Two problems:
- Label noise: label-flip noise (samples mislabeled as other training categories) and outlier noise (samples that belong to no training category).
- Domain shift: domain distribution mismatch between web data and consumer data.
Solutions:
- Multi-instance learning: [4] (pixel-level attention), [5], [6], [19] (image-level attention)
- Bootstrapping: [12]
- Negative learning: [18]
- Cyclical training: [20]
- Use auxiliary clean data:
  - Active learning (select informative training samples to annotate): [13]
  - Reinforcement learning (learn labeling policies): [14]
  - Analogous to semi-supervised learning, with the clean subset playing the role of labeled data.
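Two of the approaches above reduce to simple modifications of the cross-entropy loss. Below is a minimal numpy sketch (function names are my own): the soft bootstrapping loss of Reed et al. [12], which mixes the given label with the model's own prediction so that confident predictions can override noisy labels (β = 0.95 follows the paper), and the complementary-label loss of NLNL [18], which trains on "this sample is NOT class ŷ" for a randomly chosen ŷ ≠ y, a statement that remains true with high probability even when y itself is wrong.

```python
import numpy as np

def soft_bootstrap_loss(probs, noisy_label, beta=0.95):
    """Soft bootstrapping [12]: the target is a convex mix of the
    one-hot (possibly wrong) label q and the model's prediction p,
    L = -sum_k (beta*q_k + (1-beta)*p_k) * log p_k."""
    q = np.zeros_like(probs)
    q[noisy_label] = 1.0
    target = beta * q + (1.0 - beta) * probs
    return -np.sum(target * np.log(probs + 1e-12))

def negative_learning_loss(probs, noisy_label, num_classes, rng):
    """Negative learning (NLNL, [18]): sample a complementary
    class comp != noisy_label and push its probability down,
    L = -log(1 - p_comp)."""
    comp = rng.choice([c for c in range(num_classes) if c != noisy_label])
    return -np.log(1.0 - probs[comp] + 1e-12)

# usage on a single 3-class prediction
probs = np.array([0.7, 0.2, 0.1])
rng = np.random.default_rng(0)
l_boot = soft_bootstrap_loss(probs, noisy_label=0)
l_neg = negative_learning_loss(probs, noisy_label=0, num_classes=3, rng=rng)
```

Both papers apply these per-sample losses inside a standard SGD loop; the sketch only shows the loss computation itself.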
Datasets:
There are two types of label noise: synthetic label noise and web label noise.
- Large-scale web datasets: WebVision v1, WebVision v2
- Fine-grained web datasets: clothing, car, Stanford Dogs, Food-101N, MIT Indoor-67, skin disease-198
- Synthetic noisy datasets via label flipping: CIFAR-10/100
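The synthetic-noise benchmarks above are typically built by corrupting a clean dataset such as CIFAR-10/100 with symmetric label flipping. A minimal sketch of that corruption (the function name and seed handling are my own; papers also use asymmetric/pairwise variants not shown here):

```python
import numpy as np

def flip_labels(labels, noise_rate, num_classes, seed=0):
    """Symmetric label-flip noise: with probability noise_rate,
    replace a label by a uniformly random *different* class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape[0]) < noise_rate
    for i in np.flatnonzero(flip):
        labels[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return labels

# usage: corrupt 40% of 2000 CIFAR-10-style labels
clean = np.zeros(2000, dtype=int)
noisy = flip_labels(clean, noise_rate=0.4, num_classes=10)
```

Because the flipped class is never the original one, the realized noise rate matches `noise_rate` in expectation, which makes results comparable across papers.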
Surveys:
References
[1] Chen, Xinlei, and Abhinav Gupta. “Webly supervised learning of convolutional networks.” ICCV, 2015.
[2] Sukhbaatar, Sainbayar, et al. “Training convolutional networks with noisy labels.” arXiv preprint arXiv:1406.2080 (2014).
[3] Xiao, Tong, et al. “Learning from massive noisy labeled data for image classification.” CVPR, 2015.
[4] Zhuang, Bohan, et al. “Attend in groups: a weakly-supervised deep learning framework for learning from web data.” CVPR, 2017.
[5] Wu, Jiajun, et al. “Deep multiple instance learning for image classification and auto-annotation.” CVPR, 2015.
[6] Ilse, Maximilian, Jakub M. Tomczak, and Max Welling. “Attention-based deep multiple instance learning.” arXiv preprint arXiv:1802.04712 (2018).
[7] Lee, Kuang-Huei, et al. “CleanNet: Transfer learning for scalable image classifier training with label noise.” CVPR, 2018.
[8] Liu, Tongliang, and Dacheng Tao. “Classification with noisy labels by importance reweighting.” T-PAMI, 2015.
[9] Misra, Ishan, et al. “Seeing through the human reporting bias: Visual classifiers from noisy human-centric labels.” CVPR, 2016.
[10] Guo, Sheng, et al. “CurriculumNet: Weakly supervised learning from large-scale web images.” ECCV, 2018.
[11] Jiang, Lu, et al. “MentorNet: Learning data-driven curriculum for very deep neural networks on corrupted labels.” arXiv preprint arXiv:1712.05055 (2017).
[12] Reed, Scott, et al. “Training deep neural networks on noisy labels with bootstrapping.” arXiv preprint arXiv:1412.6596 (2014).
[13] Krause, Jonathan, et al. “The unreasonable effectiveness of noisy data for fine-grained recognition.” ECCV, 2016.
[14] Yeung, Serena, et al. “Learning to learn from noisy web videos.” CVPR, 2017.
[15] Veit, Andreas, et al. “Learning from noisy large-scale datasets with minimal supervision.” CVPR, 2017.
[16] Xu, Zhe, et al. “Webly-supervised fine-grained visual categorization via deep domain adaptation.” T-PAMI, 2016.
[17] Li, Yuncheng, et al. “Learning from noisy labels with distillation.” ICCV, 2017.
[18] Kim, Youngdong, et al. “NLNL: Negative learning for noisy labels.” ICCV, 2019.
[19] “MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition”, CVPR, 2019.
[20] Huang, Jinchi, et al. “O2U-Net: A simple noisy label detection approach for deep neural networks.” ICCV, 2019.